Update GRROsqueryCollector to use threadpoolexecutor #696

sydp · 2022-12-22T11:12:53Z

As suggested, this uses ThreadPoolExecutor to run osquery in multiple threads.
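For readers unfamiliar with the pattern, here is a minimal sketch of fanning a query out to several hosts with `concurrent.futures.ThreadPoolExecutor`. It is not the code from this PR: `run_osquery_on_host`, `collect`, and the `max_workers` default are hypothetical names chosen for illustration, and the per-host function is a stub rather than a real GRR flow.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_osquery_on_host(hostname: str, query: str) -> dict:
    """Hypothetical stand-in for launching an osquery flow on one host."""
    # A real collector would start a flow and wait for it; this stub just
    # echoes its inputs with an empty row list.
    return {"hostname": hostname, "query": query, "rows": []}


def collect(hostnames, query, max_workers=4):
    """Run the query on every host in parallel and gather results per host."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its host so failures can be attributed.
        futures = {
            executor.submit(run_osquery_on_host, host, query): host
            for host in hostnames
        }
        for future in as_completed(futures):
            host = futures[future]
            try:
                results[host] = future.result()
            except Exception as exc:  # keep going if one host fails
                results[host] = exc
    return results
```

Calling `collect(["host-a", "host-b"], "SELECT 1;")` returns one entry per host. Storing the exception instead of re-raising is one possible design choice: a single failing host then does not discard the results from the others.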

tomchop · 2022-12-28T09:13:06Z

dftimewolf/lib/collectors/grr_hosts.py

+      results_container = containers.OsqueryResult(
+          name=name,
+          description=description,
+          query=query,
+          hostname=hostname,
+          data_frame=pd.DataFrame(),
+          flow_identifier=flow_identifier,
+          client_identifier=client_identifier)
+      self.state.StoreContainer(results_container)


What's the point in storing an empty dataframe here? Wouldn't it be better just to not store any container?

sydp · 2022-12-29

I have it as an empty dataframe because the corresponding container attribute is currently not optional. It also simplifies the logic in downstream processing of the container.

tomchop · 2022-12-30

My question was more "why add a container at all" if the dataframe is going to be empty anyway.

sydp · 2022-12-30 (edited)

Oops, my bad, I missed the second question. My rationale for doing it this way was that no result (i.e. empty data) is still a result, and it is useful feedback downstream: it lets the module/user know that the query was successful and returned nothing.

tomchop · 2023-01-04T12:28:04Z

OK, that makes sense, thanks!
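The distinction agreed on in this thread (an empty result is still a result) can be sketched as follows. This is an illustration under stated assumptions, not dftimewolf code: `QueryResult` is a hypothetical stand-in for `containers.OsqueryResult`, it holds a plain list of rows instead of a `pd.DataFrame` so the sketch has no dependencies, and `summarize` is an invented helper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QueryResult:
    """Hypothetical stand-in for containers.OsqueryResult."""
    hostname: str
    query: str
    rows: List[dict] = field(default_factory=list)  # empty means "no rows"


def summarize(hostnames: List[str], stored: List[QueryResult]) -> Dict[str, str]:
    """Tell apart 'no container' from 'container with zero rows'."""
    by_host = {result.hostname: result for result in stored}
    summary = {}
    for host in hostnames:
        result = by_host.get(host)
        if result is None:
            # Nothing was stored, so downstream cannot tell what happened.
            summary[host] = "no container stored"
        elif not result.rows:
            # An empty container is positive confirmation that the query ran.
            summary[host] = "query succeeded, 0 rows"
        else:
            summary[host] = f"query succeeded, {len(result.rows)} rows"
    return summary
```

If no container were stored for an empty result, a downstream module could not distinguish "the query ran and found nothing" from "the query never completed", which is the point raised in the discussion above.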

Redone to use threadpoolexecutor

6a1e16b

sydp mentioned this pull request Dec 22, 2022

[WIP] Move osquery flow creation to PreProcess() #690

Closed

Unused import

15138b1

sydp marked this pull request as ready for review December 22, 2022 11:26

sydp self-assigned this Dec 22, 2022

sydp requested review from ramo-j and tomchop December 22, 2022 11:27

tomchop requested changes Dec 22, 2022

Four resolved review threads on dftimewolf/lib/collectors/grr_hosts.py (three marked outdated)

Updates per review

69fed93

sydp mentioned this pull request Dec 27, 2022

Handle GRR Flow timeouts better #698

Open

sydp requested a review from tomchop December 27, 2022 20:11

tomchop requested changes Dec 28, 2022

sydp and others added 2 commits December 29, 2022 11:52

Update dftimewolf/lib/collectors/grr_hosts.py

92b3d52

Co-authored-by: Thomas Chopitea <[email protected]>

Merge branch 'main' into osquery_threadpoolexecutor

228d563

tomchop approved these changes Jan 4, 2023

tomchop merged commit 0faedec into log2timeline:main Jan 4, 2023

sydp commented Dec 22, 2022

tomchop Dec 28, 2022

sydp Dec 29, 2022

tomchop Dec 30, 2022

sydp Dec 30, 2022 (edited)

tomchop Jan 4, 2023

tomchop Jan 4, 2023
